Smart Paper Notes

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
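The patch-based strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not code from the surveyed challenges: the function names `patch_starts` and `iter_patches`, the list-of-rows image layout, and the rule of adding one extra border-aligned patch are all assumptions made for illustration.

```python
from typing import Iterator, List, Tuple


def patch_starts(size: int, patch: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the patches cover [0, size)."""
    if patch > size:
        raise ValueError("patch must not exceed the image size")
    if stride <= 0:
        raise ValueError("stride must be positive")
    starts = list(range(0, size - patch + 1, stride))
    # Assumption: if the stride leaves an uncovered strip at the border,
    # add one final patch that ends exactly at the border.
    if starts[-1] != size - patch:
        starts.append(size - patch)
    return starts


def iter_patches(
    image: List[List[float]], patch: int, stride: int
) -> Iterator[Tuple[int, int, List[List[float]]]]:
    """Yield (row, col, patch) for a 2D image stored as a list of rows."""
    height, width = len(image), len(image[0])
    for r in patch_starts(height, patch, stride):
        for c in patch_starts(width, patch, stride):
            yield r, c, [row[c:c + patch] for row in image[r:r + patch]]


# Usage: a 5x5 toy "image" tiled into overlapping 3x3 patches with stride 2.
toy = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
origins = [(r, c) for r, c, _ in iter_patches(toy, patch=3, stride=2)]
```

Each patch can then be fed to a model separately, which keeps memory use bounded by the patch size rather than the full sample size. How overlapping predictions are merged afterwards is left open here.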

translated by Google Translate

Distributed Deep Reinforcement Learning: A Survey and A Multi-Player Multi-Agent Learning Toolbox

Qiyue Yin, Tongtong Yu, Shengqi Shen, Jun Yang, Meijing Zhao, Kaiqi Huang, Bin Liang, Liang Wang

Categories: Machine Learning | Artificial Intelligence

2022-12-01

With the breakthrough of AlphaGo, deep reinforcement learning has become a recognized technique for solving sequential decision-making problems. Despite its reputation, the data inefficiency caused by its trial-and-error learning mechanism makes deep reinforcement learning hard to apply in a wide range of areas. Plenty of methods have been developed for sample-efficient deep reinforcement learning, such as environment modeling, experience transfer, and distributed modifications. Among these, distributed deep reinforcement learning has shown its potential in various applications, such as human-computer gaming and intelligent transportation. In this paper, we summarize the state of this exciting field by comparing the classical distributed deep reinforcement learning methods and studying the components that are important for efficient distributed learning, covering settings from single-player single-agent distributed deep reinforcement learning to the most complex multi-player multi-agent distributed deep reinforcement learning. Furthermore, we review recently released toolboxes that help to realize distributed deep reinforcement learning without many modifications of their non-distributed versions. Based on an analysis of their strengths and weaknesses, we develop and release a multi-player multi-agent distributed deep reinforcement learning toolbox, which is further validated on Wargame, a complex environment, showing the usability of the proposed toolbox for multi-player multi-agent distributed deep reinforcement learning in complex games. Finally, we point out challenges and future trends, hoping this brief review can provide a guide or a spark for researchers who are interested in distributed deep reinforcement learning.


CWD: A Machine Learning based Approach to Detect Unknown Cloud Workloads

Mohammad Hossain, Derssie Mebratu, Niranjan Hasabnis, Jun Jin, Gaurav Chaudhary, Noah Shen

Categories: Machine Learning

2022-11-28

Workloads in modern cloud data centers are becoming increasingly complex. The number of workloads running in cloud data centers has been growing exponentially for the last few years, and cloud service providers (CSPs) have been supporting on-demand services in real time. Recognizing the growing complexity of the cloud environment and of cloud workloads, hardware vendors such as Intel and AMD are increasingly introducing cloud-specific workload acceleration features in their CPU platforms. These features are typically targeted towards popular and commonly used cloud workloads. Nonetheless, uncommon, customer-specific workloads (unknown workloads) may not realize the potential of the underlying platform if their characteristics differ from those of common workloads (known workloads). To address this problem, we develop a machine learning based technique to characterize, profile, and predict workloads running in the cloud environment. Experimental evaluation of our technique demonstrates good prediction performance. We also develop techniques to analyze the performance of the model in a standalone manner.


Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval

Chengzhi Lin, Ancong Wu, Junwei Liang, Jun Zhang, Wenhang Ge, Wei-Shi Zheng, Chunhua Shen

Categories: Computer Vision | Natural Language Processing

2022-09-27

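Cross-modal retrieval between videos and texts has attracted growing research interest due to the rapid emergence of videos on the web. Generally, a video contains rich instance and event information, while the query text describes only part of that information. A video can therefore correspond to multiple different text descriptions and queries. We call this phenomenon the "video-text correspondence ambiguity" problem. Current techniques mostly concentrate on mining local or multi-level alignment between the contents of a video and a text (e.g., object to entity and action to verb). It is difficult for these methods to alleviate the video-text correspondence ambiguity, because they describe a video with only a single feature that has to match multiple different text features at the same time. To address this problem, we propose a Text-Adaptive Multiple Visual Prototype Matching model, which automatically captures multiple prototypes to describe a video by adaptively aggregating video token features. Given a query text, the similarity is determined by the most similar prototype in order to find the correspondence in the video, which we term text-adaptive matching. To learn diverse prototypes that represent the rich information in videos, we propose a variance loss that encourages different prototypes to attend to different contents of the video. Our method outperforms state-of-the-art methods on four public video retrieval datasets.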
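The matching rule described in the abstract, where the score is taken from the most similar prototype, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the use of cosine similarity, the plain-list feature vectors, and the names `cosine`, `text_adaptive_similarity`, and `rank_videos` are assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def text_adaptive_similarity(
    text_feat: Sequence[float], prototypes: Sequence[Sequence[float]]
) -> float:
    """Score a video by its prototype that is most similar to the query text."""
    return max(cosine(text_feat, p) for p in prototypes)


def rank_videos(
    text_feat: Sequence[float], video_prototypes: Dict[str, List[List[float]]]
) -> List[str]:
    """Return video ids sorted from best to worst match for the query."""
    return sorted(
        video_prototypes,
        key=lambda vid: text_adaptive_similarity(text_feat, video_prototypes[vid]),
        reverse=True,
    )


# Usage with hypothetical 2D features: each video has two prototypes.
videos = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[1.0, 1.0], [-1.0, 0.0]],
}
query = [0.0, 1.0]
ranking = rank_videos(query, videos)
```

Taking the maximum over prototypes means a query only needs to agree with one aspect of a video, which is how a single video can match several different descriptions.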


Multi-dataset Training of Transformers for Robust Action Recognition

Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen

Categories: Computer Vision

2022-09-26

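We study the task of learning robust feature representations that generalize well across multiple datasets for action recognition. We build our method on Transformers because of their efficacy. Although video action recognition has made great progress over the past decade, training a single model that performs well across multiple datasets remains challenging yet valuable. Here, we propose a novel multi-dataset training paradigm, MultiTrain, with two new loss terms, namely an informative loss and a projection loss, that aim to learn robust representations for action recognition. In particular, the informative loss maximizes the expressiveness of the feature embedding, while the projection loss for each dataset mines the intrinsic relations between classes across datasets. We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet, and Something-Something-v2. Extensive experimental results show that our method consistently improves on state-of-the-art performance.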


Tensor-Based Multi-Modality Feature Selection and Regression for Alzheimer's Disease Diagnosis

Jun Yu, Zhaoming Kong, Liang Zhan, Li Shen, Lifang He

Categories: Machine Learning | Computer Vision

2022-09-23

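The assessment of Alzheimer's disease (AD) and mild cognitive impairment (MCI) associated with brain changes remains a challenging task. Recent studies have shown that combining multi-modality imaging techniques can better reflect pathological characteristics and contribute to a more accurate diagnosis of AD and MCI. In this paper, we propose a novel tensor-based multi-modality feature selection and regression method for the diagnosis and biomarker identification of AD and MCI against normal controls. Specifically, we leverage the tensor structure to exploit the high-level correlation information inherent in multi-modality data, and we investigate tensor-level sparsity in the multilinear regression model. We demonstrate the practical advantages of our method by analyzing ADNI data with three imaging modalities (VBM-MRI, FDG-PET, and AV45-PET) together with clinical parameters of disease severity and cognitive scores. The experimental results show that our proposed method outperforms state-of-the-art methods in disease diagnosis and in the identification of disease-specific regions and modality-related differences. The code for this work is publicly available at https://github.com/junfish/bios22.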


GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images

Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, Sanja Fidler

Categories: Computer Vision

2022-09-22

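As several industries move towards modeling massive 3D virtual worlds, the need for content creation tools that can scale in terms of the quantity, quality, and diversity of 3D content is becoming evident. In our work, we aim to train performant 3D generative models that synthesize textured meshes which can be consumed directly by 3D rendering engines and are thus immediately usable in downstream applications. Prior works on 3D generative modeling either lack geometric details, are limited in the mesh topology they can produce, typically do not support textures, or use neural renderers in the synthesis process, which makes their use in common 3D software non-trivial. In this work, we introduce GET3D, a generative model that directly generates explicit textured 3D meshes with complex topology, rich geometric details, and high-fidelity textures. We bridge recent successes in differentiable surface modeling, differentiable rendering, and 2D generative adversarial networks to train our model from 2D image collections. GET3D is able to generate high-quality 3D textured meshes, ranging from cars, chairs, animals, motorbikes, and human characters to buildings, achieving significant improvements over previous methods.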


H2-Stereo: High-Speed, High-Resolution Stereoscopic Video System

Ming Cheng, Yiling Xu, Wang Shen, M. Salman Asif, Chao Ma, Jun Sun, Zhan Ma

Categories: Computer Vision

2022-08-04

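High-speed, high-resolution stereoscopic (H2-Stereo) video allows us to perceive dynamic 3D content at fine granularity. However, acquiring H2-Stereo video with commodity cameras remains challenging. Existing spatial super-resolution and temporal frame interpolation methods provide compromised solutions that lack temporal or spatial details, respectively. To alleviate this problem, we propose a dual-camera system in which one camera captures high-spatial-resolution low-frame-rate (HSR-LFR) video with rich spatial details, while the other captures low-spatial-resolution high-frame-rate (LSR-HFR) video with smooth temporal details. We then devise a Learned Information Fusion network (LIFnet) that exploits cross-camera redundancies to enhance both camera views and thereby reconstruct the H2-Stereo video effectively. We use a disparity network to transfer spatiotemporal information across views even in large-disparity scenes, and on this basis we propose disparity-guided flow-based warping for the LSR-HFR view and complementary warping for the HSR-LFR view. A multi-scale fusion method in the feature domain is proposed to minimize occlusion-induced warping ghosts and holes in the HSR-LFR view. LIFnet is trained end to end on a high-quality stereo video dataset collected from YouTube. Extensive experiments show that our model outperforms existing state-of-the-art methods on both synthetic data and camera-captured real data. Ablation studies explore various aspects, including spatiotemporal resolution, camera baseline, camera desynchronization, long/short exposures, and applications, to fully understand the system's capability for potential applications.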


Dynamic Contrastive Distillation for Image-Text Retrieval

Jun Rao, Liang Ding, Shuhan Qi, Meng Fang, Yang Liu, Li Shen, Dacheng Tao

Categories: Artificial Intelligence | Natural Language Processing | Computer Vision

2022-07-04

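Although image-text retrieval (ITR) equipped with vision-and-language pretraining (VLP) has achieved remarkable progress in the past two years, it suffers from a major drawback: the ever-increasing size of VLP models restricts their deployment in real-world search scenarios (where high latency is unacceptable). To alleviate this problem, we present a novel plug-in dynamic contrastive distillation (DCD) framework to compress large VLP models for the ITR task. Technically, we face the following two challenges: 1) the typical uni-modal metric learning approach is difficult to apply directly to cross-modal tasks, because limited GPU memory does not allow optimizing over too many negative samples while handling cross-modal fusion features; 2) it is inefficient to statically optimize the student network on different hard samples, which have different effects on distillation learning and student network optimization. We try to overcome these challenges in two ways. First, to achieve multi-modal contrastive learning and to balance training cost and effect, we propose to use a teacher network to estimate the difficult samples for the student, so that the student absorbs the powerful knowledge of the pre-trained teacher and masters the knowledge from hard samples. Second, to learn dynamically from hard sample pairs, we propose dynamic distillation, which dynamically learns from samples of different difficulties, from the perspective of better balancing the difficulty of knowledge and the student's self-learning ability. We successfully apply our proposed DCD strategy to two state-of-the-art vision-language pretrained models, ViLT and METER. Extensive experiments on the MS-COCO and Flickr30K benchmarks show the effectiveness and efficiency of our DCD framework. Encouragingly, we can speed up inference by at least 129× compared to existing ITR models.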


Collaborative Navigation and Manipulation of a Cable-towed Load by Multiple Quadrupedal Robots

Chenyu Yang, Guo Ning Sue, Zhongyu Li, Lizhi Yang, Haotian Shen, Yufeng Chi, Akshara Rai, Jun Zeng, Koushil Sreenath

Categories: Robotics | Artificial Intelligence

2022-06-29

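This paper tackles the problem of robots collaboratively towing a load with cables to a specified goal location while avoiding collisions in real time. Introducing cables (as opposed to rigid links) enables the robot team to travel through narrow spaces by changing its intrinsic dimensions through slack/taut switches of the cables. However, this is a challenging problem because of the hybrid mode switches and the dynamical coupling among the multiple robots and the load. Previous attempts at such a problem were performed offline and did not consider online obstacle avoidance. In this paper, we introduce a cascaded planning scheme with a parallelized centralized trajectory optimization that deals with the hybrid mode switches. We additionally develop a set of decentralized planners, one per robot, which enables our approach to solve the collaborative load manipulation problem online. We develop and demonstrate one of the first collaborative autonomy frameworks that is able to move a cable-towed load, which is too heavy for a single robot to move, through narrow spaces with real-time feedback and reactive planning in experiments.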
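The slack/taut switching mentioned in the abstract can be made concrete with a small sketch. It is not taken from the paper: it assumes an ideal inextensible cable in the plane, a hypothetical tolerance `tol`, and hypothetical helper names `cable_mode` and `pulling_robots`.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def cable_mode(robot_xy: Point, load_xy: Point, cable_length: float,
               tol: float = 1e-3) -> str:
    """Classify one cable as 'taut' or 'slack' from the robot-load distance."""
    dist = math.dist(robot_xy, load_xy)
    # Assumption: the cable cannot stretch, so a larger distance is invalid.
    if dist > cable_length + tol:
        raise ValueError("robot-load distance exceeds the cable length")
    return "taut" if dist >= cable_length - tol else "slack"


def pulling_robots(robots: Dict[str, Point], load_xy: Point,
                   cable_length: float) -> List[str]:
    """Names of robots whose cable is taut, i.e. can currently pull the load."""
    return [name for name, xy in robots.items()
            if cable_mode(xy, load_xy, cable_length) == "taut"]


# Usage: robot r1 is at full cable length, robot r2 has moved closer.
team = {"r1": (1.0, 0.0), "r2": (0.0, 0.5)}
active = pulling_robots(team, load_xy=(0.0, 0.0), cable_length=1.0)
```

In this simplified picture a slack cable transmits no force, so a robot can reposition freely and shrink the team's footprint. Each taut/slack combination is one discrete mode that a planner has to account for.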
